VLM Finetuning support #411

nikita-smetanin · 2025-12-22T20:28:32Z

Add support for Multimodal datasets in OpenAI-like format
Add support for Vision-Language model training with optional Vision encoder finetuning

sbassam · 2025-12-23T02:39:54Z

src/together/utils/files.py

+        elif messages_are_multimodal != is_multimodal:
+            # Due to the format limitation, we cannot mix multimodal and text only messages in the same sample.
+            raise InvalidFileFormatError(
+                "Messages in the conversation must be either all in multimodal or all intext only format.",


nit: typo: ...or all in text-only

sbassam · 2025-12-23T02:43:34Z

src/together/utils/files.py

+        message: The message to check.
+        idx: Line number in the file.


Please update these

connermanuel · 2025-12-23T06:06:19Z

tests/unit/test_finetune_resources.py

certainly dont mind this change but i wonder how it got in

connermanuel · 2025-12-23T06:12:23Z

src/together/cli/api/finetune.py


+    if model_limits.supports_vision:
+        # Don't show price estimation for multimodal models yet
+        confirm = True


sorry, i don't have context here, why does this prevent showing the price estimation?

connermanuel · 2025-12-23T06:13:47Z

src/together/resources/finetune.py


+    if model_limits.supports_vision:
+        multimodal_params = FinetuneMultimodalParams(train_vision=train_vision)
+    elif train_vision:


supernit: i prefer
elif not model_limits.supports_vision and train_vision
here. it's logically the same, but the condition is clearer

connermanuel · 2025-12-23T06:17:50Z

src/together/utils/files.py

                    line_number=idx + 1,
                    error_source="key_value",
                )
-            if not isinstance(message[column], str):


perhaps you can check isinstance(message[column], MessageContent) instead?

connermanuel · 2025-12-23T06:18:39Z

src/together/utils/files.py

 def _check_message_role(
-    message: Dict[str, str | bool], previous_role: str | None, idx: int
-) -> str | bool:
+    message: Dict[str, str | int | MessageContent], previous_role: str | None, idx: int


when is the message an int?

connermanuel · 2025-12-23T06:19:34Z

src/together/utils/files.py



+def _check_message_content(
+    message_content: str | int | MessageContent, role: str, idx: int


when is the message an int?

nikita-smetanin added 7 commits December 22, 2025 18:39

Support Multimodal datasets

b93a673

Support Multimodal datasets

a71eee3

Support Multimodal datasets

0890d35

Support Multimodal datasets

367f606

Support VLM finetuning

b026e4e

Support VLM finetuning

c93a870

Support VLM finetuning

158ae5a

nikita-smetanin requested review from connermanuel and sbassam December 22, 2025 20:28

sbassam approved these changes Dec 23, 2025

View reviewed changes

connermanuel approved these changes Dec 23, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

VLM Finetuning support #411

VLM Finetuning support #411

nikita-smetanin commented Dec 22, 2025

Uh oh!

sbassam Dec 23, 2025

Uh oh!

sbassam Dec 23, 2025

Uh oh!

connermanuel Dec 23, 2025

Uh oh!

connermanuel Dec 23, 2025

Uh oh!

connermanuel Dec 23, 2025

Uh oh!

connermanuel Dec 23, 2025

Uh oh!

connermanuel Dec 23, 2025

Uh oh!

connermanuel Dec 23, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants



		def _check_message_content(
		message_content: str \| int \| MessageContent, role: str, idx: int

VLM Finetuning support #411

Are you sure you want to change the base?

VLM Finetuning support #411

Conversation

nikita-smetanin commented Dec 22, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants